- WNILS Working Group Chris Weider
- INTERNET-DRAFT Merit Network, Inc.
- Jim Fullton
- UNC Chapel Hill
- Simon Spero
- 11/10/92 UNC Chapel Hill
-
-
- Architecture of the Whois++ Index Service
-
- Status of this memo:
-
- The authors describe an architecture for indexing in distributed databases,
- and apply this to the WHOIS++ protocol.
-
-
- This document is an Internet Draft. Internet Drafts are working
- documents of the Internet Engineering Task Force (IETF), its Areas,
- and its Working Groups. Note that other groups may also distribute
- working documents as Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six
- months. Internet Drafts may be updated, replaced, or obsoleted
- by other documents at any time. It is not appropriate to use
- Internet Drafts as reference material or to cite them other than
- as a "working draft" or "work in progress."
-
- Please check the I-D abstract listing contained in each Internet
- Draft directory to learn the current status of this or any
- other Internet Draft.
-
- This Internet Draft expires May 10, 1993.
-
- 1. Purpose:
-
- The WHOIS++ directory service [GDS, 1992] is intended to provide
- a simple, extensible directory service predicated on a template-based
- information model and a flexible query language. This document describes
- an architecture designed to link together many of these WHOIS++ servers
- into a distributed, searchable wide area directory service.
-
- 2. Scope:
-
- This document details a distributed, easily maintained architecture for
- providing a unified index to a large number of distributed WHOIS++
- servers. This architecture can be used with systems other than WHOIS++ to
- provide a distributed directory service which is also searchable.
-
- 3. Motivation and Introduction:
-
- It seems clear that with the vast amount of directory information potentially
- available on the Internet, it is simply unfeasible to build a centralized
- directory to serve all this information. Therefore, we should look at building
- a distributed directory service. If we are to distribute the directory service,
- the easiest (although not necessarily the best) way of building the directory
- service is to build a hierarchy of directory information collection agents.
- In this architecture, a directory query is delivered to a certain agent
- in the tree, and then handed up or down, as appropriate, so that the query
- is delivered to the agent which holds the information which fills the query.
- This approach has been tried before, most notably in some implementations of
- the X.500 standard. However, there are two major flaws with the approach
- as it has been taken. This new Index Service is designed to fix these flaws.
-
- 3.1 The search problem
-
- Current implementations of this hierarchical architecture require that a search
- query issued at a certain location in the directory agent tree be replicated
- to _all_ subtrees, because there is no way to tell which subtrees might
- contain the desired information. It is obvious that this has rather extreme
- scaling problems, and in fact the search facility has been turned off in the
- X.500 architecture because of this problem. Our new WHOIS++ architecture
- solves this problem by having a set of 'forward information' at each level
- of the tree. That is, each level of the tree has some idea of where to look
- lower in the tree to find the requested information. Consequently, the
- search tree can be pruned enormously, making search feasible at all levels
- of the tree. We have chosen a certain set of information to hand up the
- tree as forward information; this may or may not be exactly the set of
- information required to build a truly searchable directory. However, it seems
- clear that without some sort of forward information, the search problem
- becomes intractable.
-
- 3.2 The location problem
-
- Current implementations of this hierarchical architecture also encode details
- about the directory agent hierarchy in the location information for a specific
- entry. With search turned off, this requires a user to know exactly how
- the hierarchy of servers is laid out and how they are named, which leads to
- acrimonious debate about the shape of the name space and really massive
- headaches whenever it becomes apparent that the current namespace is unsuited
- to the current usages and must be changed. The new Index Service gets around
- this by a) not enforcing a true hierarchy on the directory agents, b)
- dissociating the directory service from the information served, and c)
- allowing new hierarchies to be built whenever necessary, without destroying
- the hierarchies already in place. Thus a user does not need to know in
- advance where in the hierarchy the information served is contained, and the
- information a user enters to guide the search does not ever have to explicitly
- show up in the hierarchy. Although there are provisions in the WHOIS++
- query syntax to watch the directory service as it hands the query around, and
- consequently to divine the structure of the directory service hierarchy,
- it really is not relevant to the user, and does not ever have to be taken
- into consideration.
-
- 3.3 The Yellow Pages problem
-
- Current implementations of this hierarchical architecture have also been
- unsuited to solving the Yellow Pages problem; that is, the problem of
- easily and flexibly building special-purpose directories (say of
- molecular biologists) and of automatically maintaining these directories
- once they have been built. In particular with the current systems, one has
- to build into the name space the attributes appropriate to the new directory.
- Since our new Index Service very easily allows directory servers to pick and
- choose between information proffered by a given entry server, and because we
- have an architecture which allows for automatic polling of data, Yellow
- Pages capabilities fall very naturally out of the design. Although the
- ability to search all levels of the tree(s) gets us a long way towards the
- Yellow Pages, it is this capacity to locate, gather, and maintain information
- in a distributed and selective way that really solves the problem.
-
-
- 4. Components of the Index Service:
-
- 4.1 WHOIS++ servers
-
- The whois++ service is described in [GDS, 1992]. As that service specifies
- only the query language, the information model, and the server responses,
- whois++ services can be provided by a wide variety of databases and directory
- services. However, to participate in the Index Service, that underlying
- database must also be able to generate a 'centroid' for the data it serves.
-
- 4.2 Centroids as forward knowledge
-
- The centroid of a server is comprised of a list of the templates and
- attributes used by that server, and a word list for each attribute.
- The word list for a given attribute contains one occurrence of every
- word which appears at least once in that attribute in some record in that
- server's data, and nothing else.
-
- For example, if a whois++ server contains exactly three records, as follows:
-
-      Record 1                        Record 2
-      Template: User                  Template: User
-      First Name: John                First Name: Joe
-      Last Name: Smith                Last Name: Smith
-      Favourite Drink: Labatt Beer    Favourite Drink: Molson Beer
-
-      Record 3
-      Template: Domain
-      Domain Name: foo.edu
-      Contact Name: Mike Foobar
-
- the centroid for this server would be
-
-      Template: User
-      First Name: Joe
-                  John
-      Last Name: Smith
-      Favourite Drink: Beer
-                       Labatt
-                       Molson
-
-      Template: Domain
-      Domain Name: foo.edu
-      Contact Name: Mike
-                    Foobar
-
- It is this information which is handed up the tree to provide forward knowledge.
- As we mention above, this may not turn out to be the ideal solution for
- forward knowledge, and we suspect that there may be a number of different
- sets of forward knowledge used in the Index Service. However, the directory
- architecture is in a very real sense independent of what types of forward
- knowledge are handed around, and it is entirely possible to build a
- unified directory which uses many types of forward knowledge.
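The centroid construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory record layout (a dict with a "Template" key), the function name `build_centroid`, whitespace tokenization, and the sorted word order are all choices made for the example and are not specified by this draft.

```python
from collections import defaultdict


def build_centroid(records):
    """Return {template: {attribute: sorted word list}} for a list of records.

    Each record is a dict with a "Template" key plus attribute/value pairs
    (a hypothetical representation, not part of the draft).
    """
    words = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # Keep one occurrence of every word seen in this attribute.
            words[template][attribute].update(value.split())
    return {
        template: {attr: sorted(ws) for attr, ws in attrs.items()}
        for template, attrs in words.items()
    }


# The three records from the example above.
records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Domain Name": "foo.edu",
     "Contact Name": "Mike Foobar"},
]
centroid = build_centroid(records)
```

Using a set per attribute is what guarantees that "Smith" and "Beer" each appear once even though two records contain them.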
-
-
- 4.3 Index servers and Index server Architecture
-
- A whois++ index server collects and collates the centroids (or other forward
- knowledge) of either a number of whois++ servers or of a number of other index
- servers. An index server must be able to generate a centroid for the
- information it contains.
-
- 4.3.1 Queries to index servers
-
- An index server will take a query in standard whois++ format, search its
- collections of centroids, determine which servers hold records which may fill
- that query, and then forward the query to the appropriate servers.
-
- 4.3.2 Index server distribution model and centroid propagation
-
- The diagram below illustrates how a tree of index servers is created for
- a set of whois++ servers.
-
-    whois++               index                index
-    servers               servers              servers
-                          for                  for
-    _______               whois++              lower-level
-   |       |              servers              index servers
-   |   A   |__
-   |_______|  \            _______
-               \----------|       |
-    _______               |   D   |__             ______
-   |       |   /----------|_______|  \           |      |
-   |   B   |__/                       \----------|      |
-   |_______|                                     |  F   |
-                                      /----------|______|
-                                     /
-    _______                _______  /
-   |       |              |       |-
-   |   C   |--------------|   E   |
-   |_______|              |_______|
-
-
- In the portion of the index tree shown above, whois++ servers A and B hand their
- centroids up to index server D, whois++ server C hands its centroid up to
- index server E, and index servers D and E hand their centroids up to index
- server F.
-
- The number of levels of index servers, and the number of index servers at each
- level, will depend on the number of whois++ servers deployed, and the response
- time of individual layers of the server tree. These numbers will have to
- be determined in the field.
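The draft requires an index server to generate a centroid for the information it contains but does not say how. One plausible reading, assumed here purely for illustration, is that an index server such as D hands up the union of the word lists it collected from A and B. The function name `merge_centroids` and the nested-dict layout are hypothetical.

```python
def merge_centroids(centroids):
    """Union several centroids of the form {template: {attribute: [words]}}."""
    merged = {}
    for centroid in centroids:
        for template, attrs in centroid.items():
            for attribute, words in attrs.items():
                bucket = merged.setdefault(template, {}).setdefault(attribute, set())
                bucket.update(words)
    # Sort only to make the result deterministic; order is not significant.
    return {
        template: {attr: sorted(ws) for attr, ws in attrs.items()}
        for template, attrs in merged.items()
    }


# Hypothetical centroids received from whois++ servers A and B.
centroid_a = {"User": {"Last Name": ["Smith"]}}
centroid_b = {
    "User": {"Last Name": ["Jones", "Smith"]},
    "Domain": {"Domain Name": ["foo.edu"]},
}
centroid_d = merge_centroids([centroid_a, centroid_b])
```

Note that the merged centroid no longer records which lower server contributed a word; an index server still needs to keep the individual centroids in order to route queries.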
-
- 4.3.4 Centroid propagation and changes to centroids
-
- Centroid propagation is initiated by an authenticated POLL command (sec. 5.2).
- The format of the POLL command allows the poller to request the centroid of
- any or all templates and attributes held by the polled server. After the
- polled server has authenticated the poller, it determines which of the
- requested centroids the poller is allowed to request, and then issues a
- CENTROID-CHANGES report (sec. 5.3) to transmit the data. When the poller
- receives the CENTROID-CHANGES report, it can authenticate the pollee to
- determine whether to add the centroid changes to its data. Additionally, if
- a given pollee knows what pollers hold centroids from the pollee, it can
- signal to those pollers the fact that its centroid has changed by issuing
- a DATA-CHANGED command. The poller can then determine if and when to
- issue a new POLL request to get the updated information. The DATA-CHANGED
- command is included in this protocol to allow 'interactive' updating of
- critical information.
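The draft leaves it to the poller to decide if and when to re-poll after receiving a DATA-CHANGED command. One simple policy, sketched here under the assumption that the poller remembers when it last polled each server, is to poll only when the announced change is newer than that last poll. The function name `needs_poll` and the use of Python `datetime` values are illustrative; the draft does not define a timestamp encoding.

```python
from datetime import datetime, timezone


def needs_poll(last_polled, latest_change):
    """True when the pollee's centroid changed after our last poll.

    `last_polled` is None if this server has never been polled.
    """
    return last_polled is None or latest_change > last_polled


# Hypothetical timestamps, both expressed in GMT as the draft requires.
last_polled = datetime(1992, 11, 10, 12, 0, tzinfo=timezone.utc)
changed_at = datetime(1992, 11, 10, 15, 30, tzinfo=timezone.utc)
should_poll = needs_poll(last_polled, changed_at)
```

A real poller would likely also weigh the Best-time-to-poll hint before issuing the POLL.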
-
- 4.3.5 Query handling and passing algorithm
-
- When an index server receives a query, it searches its collection of centroids,
- and determines which servers hold records which may fill that query. As
- whois++ becomes widely deployed, it is expected that some index servers
- may specialize in indexing certain whois++ templates or perhaps even
- certain fields within those templates. If an index server obtains a match
- with the query _for those template fields and attributes the server indexes_,
- it is to be considered a match for the purpose of forwarding the query.
- When the index server has completed its search to match the query to a
- server, it then forwards the request as shown in 5.4.
-
- Each server in the chain can then use the authentication information
- included in the FORWARDED-QUERY command to determine whether to continue
- forwarding the query.
-
- Also, a whois++ query can specify the 'trace' option, which sends to
- the user a string containing the IANA handle and an identification
- string for each index server the query is handed to.
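The matching step of the algorithm above can be sketched as a lookup over the collected centroids. This is a minimal illustration, assuming an exact, case-sensitive word match on a single template and attribute; the real whois++ query language is richer, and the names `candidate_servers` and `collected` as well as the server handles are hypothetical.

```python
def candidate_servers(collected, template, attribute, word):
    """Return handles of servers whose centroid may fill the query.

    `collected` maps a server handle to that server's centroid,
    {template: {attribute: [words]}}. Only templates and attributes
    this index server actually holds are consulted.
    """
    matches = []
    for handle, centroid in collected.items():
        if word in centroid.get(template, {}).get(attribute, []):
            matches.append(handle)
    return sorted(matches)


collected = {
    "server-A": {"User": {"Last Name": ["Smith"]}},
    "server-B": {
        "User": {"Last Name": ["Jones"]},
        "Domain": {"Contact Name": ["Smith"]},
    },
}
hits = candidate_servers(collected, "User", "Last Name", "Smith")
```

Only the servers returned here would receive the forwarded query, which is how the forward knowledge prunes the search tree.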
-
- 5. Syntax for operations of the Index Service:
-
- 5.1 Data changed syntax
-
- The data changed template looks like this:
-
- DATA-CHANGED:
- Version-number: // version number of index service software, used to ensure
- // compatibility
- Time-of-latest-centroid-change: // time stamp of latest centroid change, GMT
- Time-of-message-generation: // time when this message was generated, GMT
- Server-handle: // IANA unique identifier for this server
- Best-time-to-poll: // For heavily used servers, this will identify when
- // the server is likely to be lightly loaded
- // so that response to the poll will be speedy, GMT
- Authentication-type: // Type of authentication used by server, or NONE
- Authentication-data: // data for authentication
- END DATA-CHANGED // This line must be used to terminate the data changed
- // message
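A message in this template style could be serialized as shown below. This is a sketch under assumptions: the draft does not state the exact wire encoding, so the "Name: value" line layout and CR/LF line endings are guesses based on the template, and the function name `format_message` and the sample values are hypothetical.

```python
def format_message(name, fields):
    """Serialize a message as 'NAME:', 'Field: value' lines, then 'END NAME'."""
    lines = [f"{name}:"]
    lines.extend(f"{key}: {value}" for key, value in fields.items())
    lines.append(f"END {name}")
    # CR/LF line endings are an assumption, not mandated for the whole message.
    return "\r\n".join(lines) + "\r\n"


msg = format_message(
    "DATA-CHANGED",
    {
        "Version-number": "1.0",      # placeholder value
        "Server-handle": "server-A",  # placeholder handle
        "Authentication-type": "NONE",
    },
)
```

The same helper would work for the other message types in this section, since they share the same framing.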
-
- 5.2 Polling syntax
-
- POLL:
- Version-number: // version number of poller's index software, used to
- // ensure compatibility
- Start-time: // give me all the centroid changes starting at this time, GMT
- End-time: // ending at this time, GMT
- Template: // a standard whois++ template name, or the keyword ALL, for a
- // full update.
- Field: // used to limit centroid update information to specific fields,
- // is either a specific field name, a list of field names,
- // or the keyword ALL
- Server-handle: // IANA unique identifier for the polling server.
- // this handle may optionally be cached by the polled
- // server to announce future changes
- Authentication-type: // Type of authentication used by poller, or NONE
- Authentication-data: // Data for authentication
- END POLL // This line must be used to terminate the poll message
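On the polled side, the Template and Field values of a POLL select which part of the centroid to return. The sketch below assumes the centroid is held as a nested dict and that Field arrives either as the keyword ALL or as a list of field names; the function name `select_centroid` is hypothetical, and authentication and time-window handling are left out.

```python
def select_centroid(centroid, template="ALL", fields="ALL"):
    """Return the part of `centroid` requested by a POLL.

    `template` is a template name or "ALL"; `fields` is a list of field
    names or "ALL". Templates with no matching field are omitted.
    """
    selected = {}
    for tmpl, attrs in centroid.items():
        if template != "ALL" and tmpl != template:
            continue
        kept = {
            attr: list(words)
            for attr, words in attrs.items()
            if fields == "ALL" or attr in fields
        }
        if kept:
            selected[tmpl] = kept
    return selected


full = {
    "User": {"First Name": ["Joe", "John"], "Last Name": ["Smith"]},
    "Domain": {"Domain Name": ["foo.edu"]},
}
subset = select_centroid(full, "User", ["Last Name"])
```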
-
- 5.3 Centroid change report
-
- CENTROID-CHANGES:
- Version-number: // version number of pollee's index software, used to
- // ensure compatibility
- Start-time: // change list starting time, GMT
- End-time: // change list ending time, GMT
- Server-handle: // IANA unique identifier of the responding server
- Authentication-type: // Type of authentication used by pollee, or NONE
- Authentication-data: // Data for authentication
- Compression-type: // Type of compression used on the data, or NONE
- Size-of-compressed-data: // size of compressed data if compression is used
- Operation: // One of 3 keywords: ADD, DELETE, FULL
- // ADD - add these entries to the centroid for this server
- // DELETE - delete these entries from the centroid of this
- // server
- // FULL - the full centroid as of end-time follows
- Multiple occurrences of the following block of fields:
- Template: // a standard whois++ template name
- Field: // a field name within that template
- Data: // the word list itself, one per line, cr/lf terminated
- end of multiply repeated block
- END CENTROID-CHANGES // This line must be used to terminate the centroid
- // change report
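The three Operation keywords can be applied to a stored centroid as sketched below. The data layout (sets of words keyed by template and field), the function name `apply_changes`, and the choice to return a new centroid rather than modify the stored one are assumptions made for the example.

```python
def apply_changes(stored, operation, blocks):
    """Apply a CENTROID-CHANGES report to a stored centroid.

    `stored` is {template: {field: set of words}}; `blocks` is a list of
    (template, field, words) tuples, one per repeated Template/Field/Data
    block. Returns a new centroid; `stored` is not modified.
    """
    if operation not in ("ADD", "DELETE", "FULL"):
        raise ValueError(f"unknown operation: {operation}")
    # FULL replaces everything; ADD and DELETE start from a copy.
    result = {} if operation == "FULL" else {
        t: {f: set(ws) for f, ws in fields.items()}
        for t, fields in stored.items()
    }
    for template, field, words in blocks:
        current = result.setdefault(template, {}).setdefault(field, set())
        if operation == "DELETE":
            current.difference_update(words)
        else:
            current.update(words)
    return result


stored = {"User": {"Last Name": {"Smith"}}}
added = apply_changes(stored, "ADD", [("User", "Last Name", ["Jones"])])
deleted = apply_changes(added, "DELETE", [("User", "Last Name", ["Smith"])])
replaced = apply_changes(added, "FULL", [("Domain", "Domain Name", ["foo.edu"])])
```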
-
- 5.4 Forwarded query
-
- FORWARDED-QUERY:
- Version-number: // version number of forwarder's index software, used to
- // ensure compatibility
- Forwarded-From: // IANA unique identifier of the server forwarding query
- Forwarded-time: // time this query forwarded, GMT (used for debugging)
- Trace-option: // YES if query has 'trace' option listed, NO if not.
- // used at message reception time to generate trace information
- Query-origination-address: // address of origin of query
- Body-of-Query: // The original query goes here
- Authentication-type: // Type of authentication used by queryer
- Authentication-data: // Data for authentication
- END FORWARDED-QUERY // This line must be used to terminate the body of the
- // query
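A receiving server needs to read such a message back into fields, for example to check Trace-option before generating trace information. The parser below is a sketch that assumes one "Name: value" pair per line and a single-line query body; neither is guaranteed by this draft, and the function name `parse_message` and the sample values are hypothetical.

```python
def parse_message(text):
    """Parse 'Name: value' lines framed by 'NAME:' and 'END NAME'.

    Returns (message_name, fields). Raises ValueError on an empty
    message or when the terminating END line is missing.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty message")
    name = lines[0].rstrip(":")
    if lines[-1] != f"END {name}":
        raise ValueError(f"missing END {name}")
    fields = {}
    for line in lines[1:-1]:
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return name, fields


raw = """FORWARDED-QUERY:
Version-number: 1.0
Forwarded-From: index-D
Trace-option: YES
Body-of-Query: Smith
END FORWARDED-QUERY
"""
name, fields = parse_message(raw)
wants_trace = fields.get("Trace-option") == "YES"
```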
-
- 6. Authors' Addresses
-
- Chris Weider
- clw@merit.edu
- Industrial Technology Institute, Pod G
- 2901 Hubbard Rd,
- Ann Arbor, MI 48105
- O: (313) 747-2730
- F: (313) 747-3185
-
- Jim Fullton
- fullton@mdewey.ga.unc.edu
- 310 Wilson Library CB #3460
- University of North Carolina
- Chapel Hill, NC 27599-3460
- O: (919) 962-9107
- F: (919) 962-5604
-
- Simon Spero
- ses@sunsite.unc.edu
- 310 Wilson Library CB #3460
- University of North Carolina
- Chapel Hill, NC 27599-3460
- O: (919) 962-9107
- F: (919) 962-5604
-
-